always retry on specific errors from azcopy #1196
src/agent/onefuzz/src/az_copy.rs
Outdated
if attempt_count >= RETRY_COUNT {
Err(backoff::Error::Permanent(x))
Err(err) => {
if !should_retry_without_attempt_increment(&err) {
We unconditionally increment attempt_count above, and this function decides whether or not we increment failure_count. I'm guessing the names drifted during dev.
What if we just rename the predicate to should_always_retry(), which matches the constant, is a non-negated phrasing of a boolean, and lets the caller interpret appropriately (without attempt/failure baggage).
Before this change, the number of attempts and the number of failures were the same.
Originally when developing this, I used a single counter. However, in considering the information given to the user, I thought it useful to track these as separate values so that the difference between them is visible.
Consider a single counter, where the source changes during upload (triggering the azcopy failure) 3 times:
azcopy sync attempt 1 failed
azcopy sync attempt 1 failed
azcopy sync attempt 1 failed
With two counters, this would look like:
azcopy sync attempt 1 failed (failure 1)
azcopy sync attempt 2 failed (failure 1)
azcopy sync attempt 3 failed (failure 1)
I think I understand the change, and it seems really reasonable.
My suggestion is just that the predicate name is now confusing, since (as your example illustrates) we always increment attempt, but only increment failure conditionally (depending on whether the error has the "always retry" status).
When I wrote "without attempt/failure baggage", I just meant "in the name of the predicate function". We should keep that distinction.
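To make the naming discussion concrete, here is a minimal synchronous sketch of the two-counter loop with the predicate named should_always_retry(). It is an illustration under stated assumptions, not the code in az_copy.rs: the value of RETRY_COUNT, the ALWAYS_RETRY_ERROR_STRINGS contents, the Outcome struct, the retry_with_two_counters helper, and the use of plain String errors are all hypothetical, and the real code drives retries through the backoff crate rather than a bare loop.

```rust
/// Hypothetical value; the source only shows that a RETRY_COUNT constant exists.
const RETRY_COUNT: u32 = 5;

/// Hypothetical marker list; the real constant and its contents may differ.
const ALWAYS_RETRY_ERROR_STRINGS: &[&str] = &["source modified during transfer"];

/// Non-negated predicate: says only whether the error has "always retry"
/// status, and leaves the counter bookkeeping to the caller.
fn should_always_retry(err: &str) -> bool {
    ALWAYS_RETRY_ERROR_STRINGS.iter().any(|s| err.contains(s))
}

/// Hypothetical result type so the counters can be inspected afterwards.
struct Outcome {
    attempt_count: u32,
    failure_count: u32,
    result: Result<(), String>,
}

/// Runs `op` until it succeeds or until `failure_count` reaches RETRY_COUNT.
/// Every call increments `attempt_count`; only errors that are NOT
/// "always retry" errors increment `failure_count`.
/// Note: this sketch has no delay and no upper bound on always-retry errors.
fn retry_with_two_counters<F>(mut op: F) -> Outcome
where
    F: FnMut() -> Result<(), String>,
{
    let mut attempt_count = 0;
    let mut failure_count = 0;
    loop {
        attempt_count += 1;
        match op() {
            Ok(()) => {
                return Outcome { attempt_count, failure_count, result: Ok(()) };
            }
            Err(err) => {
                if !should_always_retry(&err) {
                    failure_count += 1;
                }
                if failure_count >= RETRY_COUNT {
                    return Outcome { attempt_count, failure_count, result: Err(err) };
                }
            }
        }
    }
}
```

With this shape, three "source modified" errors followed by a success would leave attempt_count at 4 and failure_count at 0, which is the distinction the two counters are meant to surface.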
As indicated in #1195, there isn't an ergonomic mechanism for fuzzers & the agent to share knowledge of when an azcopy sync is occurring. As such, errors that are indicative of this difficulty should be retried automatically without impacting the "how many times to retry on error" counter.